Boosting in the Limit: Maximizing the Margin of Learned Ensembles
Authors
Abstract
The “minimum margin” of an ensemble classifier on a given training set is, roughly speaking, the smallest vote it gives to any correct training label. Recent work has shown that the Adaboost algorithm is particularly effective at producing ensembles with large minimum margins, and theory suggests that this may account for its success at reducing generalization error. We note, however, that the problem of finding good margins is closely related to linear programming, and we use this connection to derive and test new “LPboosting” algorithms that achieve better minimum margins than Adaboost. However, these algorithms do not always yield better generalization performance. In fact, more often the opposite is true. We report on a series of controlled experiments which show that no simple version of the minimum-margin story can be complete. We conclude that the crucial question as to why boosting works so well in practice, and how to further improve upon it, remains mostly open. Some of our experiments are interesting for another reason: we show that Adaboost sometimes does overfit—eventually. This may take a very long time to occur, however, which is perhaps why this phenomenon has gone largely unnoticed.
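The connection the abstract draws between finding good margins and linear programming can be made concrete: given the +/-1 predictions of a fixed pool of weak hypotheses on the training set, the best achievable minimum margin is the optimal value of a small LP over the ensemble weights. The sketch below illustrates that LP on synthetic data; the variable names (H, M, rho) and the use of scipy.optimize.linprog are assumptions of this illustration, not details taken from the paper.

```python
# Minimal sketch of the margin/LP connection: maximize the minimum margin
# of a convex combination of fixed weak hypotheses (synthetic data only).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, T = 50, 10                                   # examples, weak hypotheses
y = rng.choice([-1, 1], size=n)                 # training labels
H = np.sign(rng.normal(size=(n, T)) + 0.3 * y[:, None])  # weak +/-1 predictions
H[H == 0] = 1

# M[i, j] = y_i * h_j(x_i): +1 where hypothesis j is correct on example i.
M = y[:, None] * H

# Maximize rho subject to M @ w >= rho, w >= 0, sum(w) = 1.
# Decision vector x = [w_1, ..., w_T, rho]; linprog minimizes, so use -rho.
c = np.zeros(T + 1)
c[-1] = -1.0
A_ub = np.hstack([-M, np.ones((n, 1))])         # rho - (M @ w)_i <= 0
b_ub = np.zeros(n)
A_eq = np.hstack([np.ones((1, T)), np.zeros((1, 1))])   # sum(w) = 1
b_eq = np.array([1.0])
bounds = [(0, None)] * T + [(None, None)]       # w >= 0, rho free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
w, rho = res.x[:T], res.x[-1]
print("optimal minimum margin:", rho)
print("check against the margins themselves:", (M @ w).min())
```

An ensemble weighted this way maximizes the minimum margin by construction, which is the quantity the abstract contrasts with Adaboost's generalization behavior.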
Similar resources
Speed and Sparsity of Regularized Boosting
Boosting algorithms with l1 regularization are of interest because l1 regularization leads to sparser composite classifiers. Moreover, Rosset et al. have shown that for separable data, standard lp-regularized loss minimization results in a margin-maximizing classifier in the limit as regularization is relaxed. For the case p = 1, we extend these results by obtaining explicit convergence bounds o...
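As a rough numerical companion to the claim in this snippet, one can minimize an l1-penalized exponential loss for a sequence of shrinking penalty weights and watch the normalized minimum margin grow as the regularization is relaxed. The exponential loss, the positive/negative split of the coefficient vector, and labels generated to be realizable by the hypothesis pool (so the data are separable) are all assumptions of this sketch, not details of the cited work.

```python
# Sketch: relax an l1 penalty on the coefficients of a fixed hypothesis pool
# and track the normalized minimum margin of the resulting classifier.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, T = 40, 6
H = rng.choice([-1, 1], size=(n, T)).astype(float)  # weak +/-1 predictions
w_true = rng.random(T) + 0.1
y = np.sign(H @ w_true)                              # labels realizable by the pool,
y[y == 0] = 1                                        # so the data are separable

def objective(pq, lam):
    p, q = pq[:T], pq[T:]                            # alpha = p - q with p, q >= 0
    alpha = p - q
    loss = np.exp(-y * (H @ alpha)).sum()            # exponential loss
    return loss + lam * (p.sum() + q.sum())          # plus l1 penalty

bounds = [(0.0, None)] * (2 * T)
for lam in [1.0, 0.1, 0.01, 0.001]:
    res = minimize(objective, x0=np.zeros(2 * T), args=(lam,),
                   method="L-BFGS-B", bounds=bounds)
    alpha = res.x[:T] - res.x[T:]
    min_margin = (y * (H @ alpha)).min() / max(np.abs(alpha).sum(), 1e-12)
    print(f"lam={lam:g}  normalized minimum margin={min_margin:.3f}")
```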
UBoost: Boosting with the Universum
It has been shown that the Universum data, which do not belong to either class of the classification problem of interest, may contain useful prior domain knowledge for training a classifier [1], [2]. In this work, we design a novel boosting algorithm that takes advantage of the available Universum data, hence the name UBoost. UBoost is a boosting implementation of Vapnik's alternative capacity ...
Boosting as a Regularized Path to a Maximum Margin Classifier
In this paper we study boosting methods from a new perspective. We build on recent work by Efron et al. to show that boosting approximately (and in some cases exactly) minimizes its loss criterion with an l1 constraint on the coefficient vector. This helps understand the success of boosting with early stopping as regularized fitting of the loss criterion. For the two most commonly used criteria...
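A small sketch of the regularized-path view described here: forward stagewise ("epsilon"-)boosting on the exponential loss, where every round adds at most eps to the l1 norm of the coefficient vector, so stopping after t rounds roughly corresponds to fitting under the constraint ||alpha||_1 <= eps * t. The loss, step size, and data below are assumptions of the sketch, not the notation of the cited paper.

```python
# Forward stagewise (epsilon-)boosting: each round takes a tiny step on the
# weak learner most correlated with the current example weights, so the l1
# norm of the coefficients grows by at most eps per round.
import numpy as np

rng = np.random.default_rng(2)
n, T = 60, 8
y = rng.choice([-1, 1], size=n)
H = np.sign(rng.normal(size=(n, T)) + 0.5 * y[:, None])   # weak +/-1 predictions
H[H == 0] = 1

eps, n_rounds = 0.01, 500
alpha = np.zeros(T)
score = np.zeros(n)                       # current ensemble score H @ alpha

for t in range(n_rounds):
    w = np.exp(-y * score)                # example weights from the exponential loss
    corr = (w * y) @ H                    # -gradient of the loss w.r.t. each alpha_j
    j = np.argmax(np.abs(corr))           # best coordinate (weak learner) this round
    step = eps * np.sign(corr[j])
    alpha[j] += step
    score += step * H[:, j]

print("l1 norm of coefficients:", np.abs(alpha).sum(), "<= eps * rounds =", eps * n_rounds)
print("training error:", np.mean(np.sign(score) != y))
```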
Supervised projection approach for boosting classifiers
In this paper we present a new approach for boosting methods for the construction of ensembles of classifiers. The approach is based on using the distribution given by the weighting scheme of boosting to construct a non-linear supervised projection of the original variables, instead of using the weights of the instances to train the next classifier. With this method we construct ensembles that ...
Large Margin Distribution Learning
Support vector machines (SVMs) and Boosting are possibly the two most popular learning approaches during the past two decades. It is well known that the margin is a fundamental issue of SVMs, whereas recently the margin theory for Boosting has been defended, establishing a connection between these two mainstream approaches. The recent theoretical results disclosed that the margin distribution r...
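The quantity this line of work emphasizes is easy to compute for any fixed ensemble: rather than only the minimum margin, one inspects the whole distribution of normalized margins y_i * f(x_i) / ||alpha||_1 over the training set. The ensemble below is synthetic and the names are illustrative only.

```python
# Compute the margin distribution of a fixed weighted ensemble on synthetic data.
import numpy as np

rng = np.random.default_rng(3)
n, T = 100, 20
y = rng.choice([-1, 1], size=n)
H = np.sign(rng.normal(size=(n, T)) + 0.4 * y[:, None])   # base-learner votes
H[H == 0] = 1
alpha = rng.random(T)                                      # nonnegative ensemble weights

margins = y * (H @ alpha) / alpha.sum()    # normalized margin of each training example
print("minimum margin:", margins.min())
print("quartiles of the margin distribution:", np.percentile(margins, [25, 50, 75]))
```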